Regeneration of Stochastic Processes: An Inverse Method 
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We propose a novel inverse method that utilizes a set of data to construct a simple equation that 
governs the stochastic process for which the data have been measured, hence enabling us to re- 
construct the stochastic process. As an example, we analyze the stochasticity in the beat-to-beat 
fluctuations in the heart rates of healthy subjects as well as those with congestive heart failure. The 
inverse method provides a novel technique for distinguishing the two classes of subjects in terms of a 
drift and a diffusion coefficients which behave completely differently for the two classes of subjects, 
hence potentially providing a novel diagnostic tool for distinguishing healthy subjects from those 
with congestive heart failure, even at the early stages of the disease development. 



1. Introduction 

Many natural or man-made phenomena, as well as the 
morphology of numerous physical systems, are charac- 
tertized by a degree of stochasticity. Turbulent flows, 
fluctuations in the stocks prices, seismic recordings, the 
internet traffic, pressure fluctuations in chemical reac- 
tors, and the surface roughness of many materials and 
rock [1,2] are but a few examples of such phenomena and 
systems. A long standing problem has been the devel- 
opment of an effective reconstruction method for such 
phenomena. That is, given a set of data for certain char- 
acteristics of a phenomenon, one would like to develop 
an effective equation that can reproduce the data with 
an accuracy comparable to the measured data. If such 
a method can be developed, one may utilize it to, (1) 
reconstruct the original process with similar statistical 
properties, and (2) understand the nature and properties 
of the stochastic process. 

In this paper we use a novel method to address this 
general problem. The proposed method utilizes a set 
of data for a phenomenon which contains a degree of 
stochasticity and constructs a simple equation that gov- 
erns the phenomenon. As wc show below, in addition 
to being highly accurate, the method is quite general; it 
is capable of providing a rational explanation for com- 
plex features of the phenomenon; it requires no scaling 
feature, and it enables us to accomplish the tasks listed 
above. As an example, we apply the method to analyze 
cardiac interbeat intervals which normally fluctuate in a 
complex manner. We show that the application of the 
method to the analysis of interbeat fluctuations in the 
heart rates may potentially lead to a novel method for 
distinguishing healthy subjects from those with conges- 
tive heart failure (CHF). 



2. The Method 

We begin by describing the steps that lead to the devel- 
opment of a stochastic equation, based on the (stochas- 
tic) data set, which is then utilized to reconstruct the 
original data, as well as an equation that describes the 
phenomenon. 

(1) As the first step we check whether the data fol- 
low a Markov chain and, if so, estimate the Markov time 
(length) scale tu- As is well-known, a given process with 
a degree of randomness or stochasticity may have a finite 
or an infinite Markov time (length) scale. The Markov 
time (length) scale is the minimum time interval over 
which the data can be considered as a Markov process 
[3-6]. To determine the Markov scale tM, we note that 
a complete characterization of the statistical properties 
of stochastic fluctuations of a quantity x in terms of a 
parameter t requires the evaluation of the joint probabil- 
ity distribution function (PDF) P n (x\, t\] ■ ■ • ; x n ,t n ) for 
an arbitrary n, the number of the data points. If the 
phenomenon is a Markov process, an important simpli- 
fication can be made, as the n-point joint PDF, P„, is 
generated by the product of the conditional probabili- 
ties p{xi+\, ti + \ \xi,ti), for i = 1, ■ • • , n — 1. A necessary 
condition for a stochastic phenomenon to be a Markov 
process is that the Chapman-Kolmogorov (CK) equation 
[7], 

p{x 2 ,t 2 \x 1 ,t 1 ) = J d(x 3 ) p(x 2 ,t 2 \x 3 ,t 3 ) p(x 3 ,t 3 \xi,ti) , 

(1) 

should hold for any value of t 3 in the interval t 2 < t 3 < t\ . 
One should check the validity of the CK equation for 





FIG. 1. Interbeats fluctuations of healthy subjects (top), 
and those with congestive heart failure (bottom). 



different xi by comparing the directly-evaluated condi- 
tional probability distributions p(x2, t2\xi, ti) with the 
ones calculated according to right side of Eq. (1). The 
simplest way to determine tu for stationary or homoge- 
neous data is the numerical calculation of the quantity, 
S = \p(x 2 ,t 2 \x 1 ,t 1 )-J dx 3 p(x 2l t 2 \x 3 ,t 3 )p(x 3 ,t 3 \x 1 ,t 1 )\, 
for given x\ and x 2 , in terms of, for example, t 3 — ti and 
considering the possible errors in estimating S. Then, 
tM = ta — 1\ for that value of t 3 — t\ for which S vanishes 
or is nearly zero (achieves a minimum). 

(2) Deriving an effective stochastic equation that de- 
scribes the fluctuations of the quantity x(t) constitutes 
the second step. The CK equation yields an evolu- 
tion equation for the change of the distribution func- 
tion P(x, t) across the scales t. The CK equation, when 
formulated in differential form, yields a master equation 
which takes the form of a Fokker-Planck equation: 



dt 



P(x,t) 



P(x,t) 



D^ 2 \x,t), are estimated directly from the data and the 
moments of the conditional probability distribu- 

tions: 



D {k) (x,t) = 1 UmAt^oM^, 



M {k) = ^ J dx{x - x) k p{x', t + At\x, t). (3) 

We note that this Fokker-Planck equation is equivalent 
to the following Langevin equation [8] : 



d_ 

dt 



x(t)=D^(x) + y/DW(x) /(f), 



(4) 



where f(t) is a random force with zero mean and Gaus- 
sian statistics, (^-correlated in t, i.e., (f(t)f(t')) = 26(t — 
t'). Note that such a reconstruction of a stochastic pro- 
cess does not imply that the data do not contain any 
correlations, or that the above formulation ignores the 
correlations. 

(3) Regeneration of the stochastic process constitutes 
the last step. Equation (4) enables us to regenerate a 
stochastic quantity which is similar to the original one 
in the statistical sense. The stochastic process is regen- 
erated by iterating Eq. (4) which yields a series of data 
without memory. To compare the regenerated data with 
the original ones, we must take the spatial (or temporal) 
interval in the numerical discretization of Eq. (4) to be 
unity (or renormalize it to unity). However, the Markov 
length or time is typically greater than unity. Therefore, 
we should correlate the data over the Markov length or 
time scale. There are a number of methods to corre- 
late the generated data in this interval [8-12]. Here, we 
propose a new technique which we refer to as the ker- 
nel method, according to which one considers a kernel 
function K(u) that satisfies the condition that, 



K(u)du = 1 , 



such that the data are determined by 



1 ™ 



t - u 



(5) 



(0) 



where h is the window width. For example, one of the 
most useful kernels is the standard normal density func- 
tion, K{u) = (27r) -1 / 2 exp(— \u 2 ). In essence, the kernel 
method represents the data as a sum of 'bumps' placed 
at the observation points, with its function determining 
the shape of the bumps, and its window width h fixing 
their width. It is evident that, over the scale h, the kernel 
method correlates the data to each other. 



(2) 



The drift and diffusion coefficients, D^(x,t) and 





FIG. 2. Test of Chapman-Kolmogorov equation for 
xi = —5, Xi = and x\ = 5. The bold and open sym- 
bols represent, respectively, the directly-evaluated PDF and 
the integrated PDF. The PDFs are shifted in the vertical di- 
rections for better presentation. Values of x are measured in 
units of the standard deviation. 





FIG. 3. The drift and diffusion coefficients D^(x) and 
D^(x), estimated by Eq. (3). For the healthy subjects 
(triangles) (x) and D^ 2 \x) follow linear and quadratic 
behavior in x, while for patients with CHF (squares) they 
follow third- and fourth-order behavior in x. 



FIG. 4. The curves show, from top to bottom, the actual 
interbeat data (for a healthy subject), the regenerated data 
using the corresponding Langevin equation, and the regener- 
ated data using the kernel method. The time series are shifted 
in the vertical directions for better presentation. 



3. Application to Fluctuations in Human Heart- 
beats 

We now apply the above method to reconstruction of 
the fluctuations in the human heartbeats of both healthy 
and ill subjects by taking h~tju- Recent studies [13-18] 
reveal that under normal conditions, beat-to-beat fluc- 
tuations in the heart rate might display extended corre- 
lations of the type typically exhibited by dynamical sys- 
tems far from equilibrium. It has been shown [14], for ex- 
ample, that the various stages of sleep may be character- 
ized by extended correlations of heart rates separated by 
a large number of beats. We show that the Markov time 
scale i&f, and the drift and diffusion coefficients of the 
interbeat fluctuations of healthy subjects and patients 
with congestive heart failure (CHF) have completely dif- 
ferent behaviour, when analyzed by the method we pro- 
pose in this paper, hence helping one to distinguish the 
two groups of the subjects. 

We analyze both daytime (12:00 pm to 18:00 pm) and 
nighttime (12:00 am to 6:00 am) heartbeat time series 
of healthy subjects, and the daytime records of patients 
with CHF. Our data base includes 10 healthy subjects (7 
females and 3 males with ages between 20 and 50, and 
an average age of 34.3 years), and 12 subjects with CHF, 
with 3 females and 9 males with ages between 22 and 71, 
and an average age of 60.8 years). Figure 1 presents the 
typical data. 

We first estimate the Markov time scale tyi of the data 
for the interbeat fluctuations. For the healthy subjects 
we find the Markov time scale for the daytime data to be 
(all the values are measured in units of the average time 
scale for the beat-to-beat times of each subject), £m = 
3, 3, 3, 1, 2, 3, 3, 2, 3 and 2. The corresponding results for 
the nighttime records are, tu are 3, 3, 1, 3, 3, 2, 3, 3, 2 and 



D (1 \x) = 



-0.0026a; - 0.0018a; 2 - 0.0007a; 3 , 
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FIG. 5. Logarithmic plot of the second moment of the 
height-difference versus m, for the actual data (circles) and 
the samples regenerated by the kernel method (squares). The 
corresponding time series are plotted in Fig. 4. 



3, respectively, comparable to those for the daytime. On 
the other hand, for the daytime records of the patients 
with CHF, the estimated Markov time scales are, = 
151,258,760,542,231,257,864,8,366,393,385, and 276. 
Therefore, the healthy subjects have tjn values that are 
much smaller than those of the patients with CHF, hence 
providing an unambiguous quantity for distinguishing the 
two. 

We then check the validity of the CK equation for 
several xi triplets by comparing the directly-evaluated 
conditional probability distributions p(x2, *2 1^1 3 ti) with 
the ones calculated according to right side of Eq. (1). 
Here, x is the interbeat and for all the samples we de- 
fine, x = (x — x)/a, where x and a are the mean and 
standard deviations of the intcrbeats data. In Figure 2, 
the two differently-computed PDFs are compared. As- 
suming the statistical errors to be the square root of the 
number of events in each bin, we find that the two PDFs 
are statistically identical. 

The corresponding drift and diffusion coefficients 
and D^ 2 \x) are displayed in Figure 3. We find 
that, in addition to the Markov time scale tu-, the two 
coefficients provide another important indicator for dis- 
tinguishing the ill from the healthy subjects: For the 
healthy subjects the drift D^ 1 ' and the diffusion coeffi- 
cient £)( 2 ) (x) are, respectively, a linear and a quadratic 
function of x, whereas the corresponding coefficients for 
patients with CHF follow a third- and fourth-order equa- 
tions in x. The analysis of the data yields the following 
approximants for the healthy subjects, 

L> (1) (a;) = -0.12a; , 

D {2) (x) = 0.05 - 0.042a; + 0.07a; 2 , (7) 
whereas for the patients with CHF we find that, 



D (2) (x) = 0.0006 - 0.0007a; + 0.0005a; 2 

+ 0.0003a; 3 + 0.0002a; 4 . (8) 

Equations (7) and (8) present the drift and diffusion 
coefficients for a typical healthy subject and one with 
CHF. We note that the final result for the Langevin equa- 
tion is the same as the results obtained in Rcf. [18]. For 
other data measured for other patients the functional de- 
pendence of £)W and D^ 2 \x) would be the same but 
with different numerical coefficients. The order of mag- 
nitude of the coefficients is the same for all the healthy 
subjects, and likewise for those with CHF (sec also Rcf. 
[19]). Moreover, if we analyze different parts of the time 
series separately, we find, (1) almost the same Markov 
time scale for different parts of the time series, but with 
some differences in the numerical values of the drift and 
diffusion coefficients, and (2) that the drift and diffusion 
coefficients for different parts of the time series have the 
same functional forms, but with different coefficients in 
equations such as (7) and (8). Hence, one can distin- 
guish the data for sleeping times from those for when the 
subjects are awake [20]. 

We also find another important difference between the 
heartbeat dynamics of the two classes of subjects: Com- 
pared with the healthy subjects, the drift and diffusion 
coefficients for the patients with CHF are very small (re- 
flecting, in some sense, the large Markov time scale £m)- 
Hence, we suggest that one may use the Markov time 
scales, the dependence of the drift and diffusion coeffi- 
cients on x, as well as their comparative magnitudes, for 
characterizing the dynamics of human heartbeats and 
their fluctuations, and to distinguish healthy subjects 
from those with CHF. To our knowledge, this proposal 
is novel. Given its relative simplicity, it would be most 
interesting to study whether this proposal can be devel- 
oped into a diagnostic tool for early detection of conges- 
tive heart failure. Work in this direction is in progress. 

We compare in Figure 4 the original time series x(n) 
with those reconstructed by the Langevin equation [by, 
for example, using Eqs. (4) and (7)] and the kernel 
method. While both methods generate scries that look 
similar to the original data, the kernel method appears 
to better mimic the behavior of the original data. To 
demonstrate the accuracy of Eq. (6), we compare in 
Figure 5 the second moment of the stochastic function, 
€2(171) = ([x(0) — x(m)] 2 ), for both the measured and 
reconstructed data using the kernel method. The agree- 
ment between the two is excellent. However, it is well- 
known that the agreement between the second moments 
of a stochastic time series and its reconstructed ver- 
sion is not sufficient for proving the accuracy of the re- 
construction method. Hence, we have also checked the 
accuracy of the higher-order structure function, S n = 



(\x(ti) — x(t 2 )\ n ) [21]. Wc find that the agreement be- 
tween S n for the original and reconstructed time series for 
n < 5 is excellent, while the difference between higher- 
order moments of the two times series, which are related 
to the tails of the PDF of the x— increments, increases. 



We would like to thank Armin Bundc for useful com- 
ments on the manuscript. 



4. Summary 



We have analyzed the interbeat fluctuations in the 
heart rates of healthy subjects, as well as those with 
congestive heart failure, by an inverse method for re- 
construction of the stochastic process that governs the 
fluctuations. The method, which is quite general and 
can regenerate a stochastic process with high precision, 
is based on utilizing measured data to estimate a drift 
and a diffusion coefficients to be used in a Fokker-Planck, 
or an equivalent Langevin, equation that describes the 
stochastic process. The analysis of the times series for hu- 
man heartbeat dynamics using the new method, for both 
healthy subjects and those with CHF, not only demon- 
strates the accuracy of the method, but also potentially 
provides a novel technique for distinguishing the heart- 
beat dynamics of the two classes of subjects. 

We should point out that Stanley and co-workers 
[13,15-17,20,21] analyze the type of data we considered 
in this paper by a method different from what we present 
in the present paper. Their analysis indicates that there 
may be long-range correlations in the data, which might 
be characterized by self-affine fractal distributions, such 
as the fractional Brownian motion or other types of 
stochastic processes that give rise to such correlations. 
They distinguish healthy subjects from those with CHF 
in terms of the type of correlations that might exist in the 
data (negative as opposed to positive correlations). The 
method proposed in the present paper is different from 
that of Stanley and co-workers in that, wc analyze the 
data in terms of Markov processes. Although our analy- 
sis does indicate the existence of correlations in the data 
but, as is well-known in the theory of Markov processes, 
such correlations, though extended, eventually decay. We 
distinguish the healthy subjects from those with CHF in 
terms of the differences between the drift and diffusion 
coefficients of the Fokker-Plank equation which, in our 
view, provides a clearer and more physical way of under- 
standing the differences between the two groups of the 
subjects. In addition, our method provides an unam- 
biguous way of reconstructing the data, hence providing 
a means to predict the behavior of the data over peri- 
ods of time that are on the order of the Markov time 
scale. Although it remains to be tested, we believe that 
our method is more sensitive to small differences between 
the data for the two groups of the subjects and, there- 
fore, might eventually provide a diagnostic tool for early 
detection of CHF in humans. 
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